The objective of this project is to test the hypothesis that complex proteins with novel functions
arose in the course of evolution by combining structural domains having partial functions. The research
focuses on the enzymes that catalyze de novo pyrimidine biosynthesis. While the reactions are the same in
most organisms, there are striking differences in the structure and regulation of these enzymes. This year
we completed the sequence of the mammalian protein CAD, a 243 kDa polypeptide which carries
glutamine dependent (GLN) carbamyl phosphate synthetase (CPS), aspartate transcarbamylase (ATC) and
dihydroorotase (DHO) activities. Phylogenetic analysis suggests that the mammalian chimeric protein was
formed by stepwise translocation and fusion of ancestral genes that occurred prior to the major radiation
that led to fungi, plants and animals. The sequence divergence suggests that the fused and
monofunctional DHO domains have a different evolutionary history. In contrast, sequence comparisons
and molecular modeling show that the ATC domain is a trimer of 34 kDa domains that has been highly
conserved throughout the course of evolution. This conclusion conflicts with the structural studies of the
prokaryotic class A ATCases, which were thought to have a radically different structural organization.
This apparent contradiction was resolved by reexamining the subunit structure of P. fluorescens
ATCase. While much larger than other prokaryotic ATCases, this class A enzyme consists of six
copies of a 34 kDa catalytic polypeptide and six copies of a 45 kDa polypeptide that probably
mediates regulation. Since the major translocations and fusions appear to predate the emergence of
eukaryotic organisms, we have begun to examine the structural organization of the pyrimidine
specific genes in archebacteria. A separate pyrimidine specific CPSase was cloned from
Methanosarcina barkeri. The gene duplication leading to separate CPSases for the pyrimidine and
arginine pathways has occurred in this archebacterium, but the genes encoding the GLN and CPS domains
are not fused.
ANNUAL REPORT ON CONTRACT N00014-87-K-0081 R&T CODE 4413031

PRINCIPAL  INVESTIGATOR:  David  R.  Evans 

CONTRACTOR:  Wayne  State  University,  Detroit,  Michigan 

CONTRACT  TITLE:  The  Evolution  and  Analysis  of  the  Functional  Domains  of  the  Chimeric 
Proteins  that  Initiate  Pyrimidine  Biosynthesis. 


PROJECT  PERIOD:  August  1,  198S  -  August  31,  1989 


RESEARCH OBJECTIVE: To determine the structural organization and trace the
evolutionary development of the complex multi-domain proteins involved in de novo
pyrimidine biosynthesis. The enzymes which catalyze the first three steps in the pathway,
glutamine dependent (GLN) carbamyl phosphate synthetase (CPS), aspartate transcarbamylase
(ATC) and dihydroorotase (DHO), are separate proteins in eubacteria, but are consolidated in a
single 243 kDa chimeric polypeptide in mammals and other higher eukaryotes. We have shown
previously that the polypeptide is organized into at least five functional domains:


GLN  CPS  A  CPS  B  DHO  ATC 
PROGRESS REPORT.


Origin  of  the  Trifunctional  Polypeptide 

During the last year we used primer extension to construct a cDNA molecule (Bein,
Simmer and Evans, manuscript in preparation) corresponding to the 5' end of the CAD mRNA.
This clone allowed us to complete the sequence determination of the entire CAD protein. We
are now analyzing the phylogenetic relationship of each of the domains using sequence data of
several prokaryotic and eukaryotic enzymes, including the hamster protein (CAD) and partial
sequence data for the pyrimidine biosynthetic complex of Dictyostelium discoideum (DIC),
published this year (Faure et al., Eur. J. Biochem. 179: 345, 1989).


While the GLN, CPS and ATC domains are clearly homologous (approximately 50%
sequence identity) to the corresponding monofunctional E. coli enzymes, the mammalian
DHO domain is strikingly different from its bacterial counterpart (Simmer et al., PNAS, in
press). Another surprising discovery was that the mammalian DHO domain is homologous to
the long interdomain linker region (Y-L) that connects the CPS and ATC domains in yeast,
but very different from the functional yeast DHOase (YST), which is encoded by a separate
gene. The phylogenetic analysis shows that the monofunctional and fused DHOases have a
different evolutionary history, since the dendrogram does not conform to the accepted
phylogeny of the organisms represented. Although the Dictyostelium lineage predates the
major radiation that led to fungi, plants and animals, the dendrogram clusters the enzyme
from Dictyostelium with higher eukaryotes and the yeast enzymes with the prokaryotic
DHOases.
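
As an illustration of what a figure such as "approximately 50% sequence identity" measures, the following minimal Python sketch computes percent identity over two pre-aligned sequences. The function name, the gap convention (columns with a gap in either sequence are skipped) and the toy sequences are assumptions made for illustration; they are not the alignment method or the data used in this work.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real enzyme sequences).
print(percent_identity("MKT-LLA", "MRTALLG"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give somewhat different values, so the denominator should be stated whenever identities are compared.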

The disparate sequences of the two classes of DHOases can be plausibly explained by
divergent evolution following duplication of an ancestral gene (scheme shown below).

According to this model, the fusion of CPSase and ATCase genes, separated by an
approximately 300 bp spacer, occurred sometime between the divergence of bacteria and slime
molds. This early event was followed by duplication of a monofunctional DHOase gene, one
copy of which was translocated and inserted into the spacer region. The inserted copy was
perhaps initially nonfunctional ("DHO"); its reactivation (DHO) in the Dictyostelium and
metazoan lineages, with the concomitant advantages of coordinate regulation, led to the
extinction of the monofunctional dihydroorotase. Reactivation did not occur in yeast and
the separate monofunctional DHOase was preserved.


ORIGIN OF THE PYRIMIDINE BIOSYNTHETIC COMPLEXES


If this explanation (developed more fully in Simmer et al., PNAS, in press) is correct, all of
the dihydroorotases are descendants of a common ancestor and the sequence differences
between the two families are a consequence of differences in structural constraints imposed on
the fused and monofunctional dihydroorotases.
In contrast, the sequence and structure of the fused and monofunctional aspartate
transcarbamylase domains are highly homologous. We built an energy minimized model of the
mammalian ATCase domain using the x-ray coordinates of the E. coli enzyme as a tertiary
template (Scully and Evans, manuscript in preparation). Favorable hydrophobic interactions, a
compact globular shape and a normal distribution of hydrophobic and hydrophilic side chains
suggest that the model is a plausible representation of the structure of the mammalian
ATCase domain. The backbone carbonyl carbons are nearly superimposable, the active site
regions are virtually identical and the trimeric subunit contacts are similar.

The  following  conclusions  are  consistent  with  these  studies. 

1. Evolution of the multidomain protein occurred by stepwise translocation and fusion of ancestral
genes coding for monofunctional proteins.

2. The formation of a large chimeric protein was an early event which predates the radiation
which led to the major families of eukaryotic organisms.

3. The sequence of most domains has been highly conserved (e.g., ATCase), while the DHO
domain has undergone much more extensive divergence, perhaps indicative of adaptive
changes.

The major gene fusions and translocations appear to have occurred much earlier in
the course of evolution than was generally believed when we began this study. Thus we are
now focusing our attention on more primitive organisms, the bacteria and archebacteria.


Prokaryotic  Class  A  Aspartate  Transcarbamylases 

Three classes of ATCase, which differ in structural organization and regulatory
properties, have been identified in eubacteria. E. coli aspartate transcarbamylase, a well
characterized class B enzyme, is a dodecamer of two catalytic subunits and three regulatory
subunits. The catalytic subunit, a trimer of 34 kDa catalytic chains, is catalytically active but
is not regulated, while the regulatory subunit, a dimer of 17 kDa regulatory chains, binds
allosteric effectors but is inactive. B. subtilis ATCase, a typical class C enzyme, is an
unregulated trimer of identical 34 kDa catalytic chains. The structure and regulation of the
class A enzymes, the largest aspartate transcarbamylases, are not well understood.

Our studies have focused on P. fluorescens ATCase, a class A ATCase reported to be a
dimer of two identical 180 kDa subunits. The enzyme is inhibited by a broad range of
nucleotides but does not exhibit cooperative substrate binding and thus apparently has a less
sophisticated mode of regulation than the E. coli enzyme.

The large size of the molecule raised the possibility that the P. fluorescens enzyme was a
multidomain protein analogous to the mammalian complex. To determine whether either
CPS or DHO activity is associated with P. fluorescens ATCase, cell extracts were fractionated
on a Sephacryl S-300 chromatographic column calibrated with proteins of known molecular
weight. This experiment showed that all of the enzymes are separate proteins. The molecular
weights of CPS and DHO are 160 kDa and 86 kDa respectively, comparable to the size of the E.
coli enzymes, while the ATCase is much larger.
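
A gel filtration column is commonly calibrated by fitting log10(molecular weight) of the standards against their elution volume and then reading an unknown off the fitted line. The minimal Python sketch below shows that procedure; the fitting approach is an assumption about how such a calibration is typically done, and the standards listed are invented placeholder values, not the calibration proteins or elution data from this experiment.

```python
import math

# Hypothetical calibration standards: (elution volume in mL, molecular weight in kDa).
# These numbers are invented for illustration and are not data from this report.
STANDARDS = [(40.0, 1000.0), (60.0, 100.0), (80.0, 10.0)]


def fit_calibration(standards):
    """Least-squares fit of log10(MW) = slope * volume + intercept."""
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(volume_ml, slope, intercept):
    """Estimate molecular weight (kDa) from an elution volume using the fitted line."""
    return 10 ** (slope * volume_ml + intercept)


slope, intercept = fit_calibration(STANDARDS)
print(round(estimate_mw(50.0, slope, intercept), 1))
```

Because elution depends on hydrodynamic size rather than mass alone, estimates from such a curve are approximate for elongated or loosely packed proteins.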

The material we have isolated represents the purest active enzyme preparation thus far
obtained. Contrary to previous reports, we found that the molecule consists of two subunits,
34 kDa and 45 kDa, which are present in approximately stoichiometric amounts in the
complex. We believe that the catalytic activity is associated with the 34 kDa polypeptide.
Preliminary protein sequence data suggest that this protein is homologous with the E. coli
catalytic subunit.

We were concerned that these polypeptides were degradation products of a large
multidomain polypeptide, so we developed a new autoradiographic assay for assessing
proteolytic activity in cell extracts. The assay uses radiolabelled CAD as a substrate (one of the
interdomain linker regions in CAD is extraordinarily sensitive to a broad range of proteases).
We conclude that proteolysis is not occurring under the conditions in which the isolation was
carried out.

Sedimentation velocity and gel filtration studies showed that the enzyme is a stable
complex with a molecular weight of approximately 480,000 and a Stokes radius of 77 Å. Thus
the enzyme is a dodecamer composed of six copies of each of the two types of subunit. We are
now attempting to dissociate the complex and isolate the constituent subunits by gel filtration,
so that the structure and function of each polypeptide can be fully characterized.
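
The arithmetic behind the dodecamer assignment can be checked with a short Python sketch. It uses only the masses reported above (34 kDa, 45 kDa and a native mass of about 480 kDa) and assumes equimolar amounts of the two chains, as the approximately stoichiometric recovery suggests; the function name and the search limit are illustrative choices.

```python
# Masses reported in the text (kDa).
CATALYTIC_KDA = 34
SECOND_KDA = 45
NATIVE_KDA = 480


def best_equimolar_copies(native_kda, chain_masses_kda, max_copies=12):
    """Return the copy number n (the same for every chain) whose predicted
    mass, n * sum(chain masses), lies closest to the measured native mass."""
    protomer_kda = sum(chain_masses_kda)
    return min(
        range(1, max_copies + 1),
        key=lambda n: abs(n * protomer_kda - native_kda),
    )


copies = best_equimolar_copies(NATIVE_KDA, [CATALYTIC_KDA, SECOND_KDA])
predicted_kda = copies * (CATALYTIC_KDA + SECOND_KDA)
print(copies, predicted_kda)
```

Six copies of each chain give 6 x (34 + 45) = 474 kDa, within about 1% of the measured value, whereas five or seven copies give 395 kDa and 553 kDa respectively.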

In  summary,  while  confirmatory  experiments  remain  to  be  carried  out,  we  can 
tentatively  draw  the  following  conclusions: 

1) The first three steps of pyrimidine biosynthesis in P. fluorescens are carried out by separate
proteins, and while the CPSase and DHOase are similar to the enzymes found in E. coli, the
ATCase is unusual. It is not, however, a dimer of 180 kDa polypeptides, but rather a
dodecamer comprised of two types of polypeptide chain.

2) The catalytic subunit is probably a trimer of identical 34 kDa polypeptides such as that
found in other prokaryotic and eukaryotic ATCases, supporting the hypothesis that they have
all evolved from a highly conserved ancestral domain.

3) The function of the second polypeptide is unknown. While it is likely to be responsible for
regulation, its large size, nearly three times the mass of the E. coli enzyme regulatory chain,
suggests that it may have other functions. Identification of these functions may provide a clue
to the origin of the highly specialized regulatory subunit found in E. coli.

These studies (Bergh and Evans, manuscript in preparation) confirm that the class A
aspartate transcarbamylases are very large proteins, but completely revise previous ideas
regarding the structural organization of these molecules.


Archebacteria

We have begun to look at the structural organization of the genes encoding the
pyrimidine biosynthetic enzymes. These studies have been aided by investigators who kindly
shared libraries and other materials: lambda libraries of Methanosarcina acetivorans (Drs. Kevin
Sowers and Robert Gunsalus, UCLA); Thermoplasma acidophilum (Dr. Charles Daniels, Ohio
State); cosmid blots and selected clones of Halobacterium volcanii (Drs. W. Ford Doolittle,
Robert Charlebois and Leo Schalkwyk).

In addition, Drs. John Reeve and Joseph Krzycki (Ohio State University) have generously
sent us Methanosarcina barkeri MS vectors and cells. We devised a method for isolating the M.
barkeri DNA after our initial attempts resulted in massive shearing, and used the DNA to
construct a lambda genomic library. Morris and Reeve (J. Bacteriol. 170: 3125, 1988) had
cloned the argininosuccinate synthetase gene (argG) and found that one of their clones
contained a 1.2 kb sequence that was homologous to carB, the gene that encodes the large
subunit of E. coli CPSase. We screened our library using two end-labelled synthetic
oligonucleotides complementary to highly conserved regions of the CPSase as probes. Plaque
hybridization revealed two classes of clones. The strongest signals were obtained from phage
containing fragments of the CPSase gene previously discovered by Morris and Reeve. These
plaques hybridize to both probes (xx, see diagram below). We have now isolated clones (e.g.,
pREL9 below) that encode all or very nearly all of this CPSase gene (argB). Plaques giving
weaker, but still definitely positive, signals were also subcloned (e.g., pRES4 below) and analyzed.
Restriction mapping and Southern analysis showed that these were derived from a distinctly
different gene which hybridizes to only one of the two probes (x). Sequencing studies now
underway are confirming that this gene codes for an entirely separate CPSase, which we have
designated pyrB.
Given its close proximity to the argG gene, it is likely that the CPSase discovered by
Morris and Reeve is specific for arginine biosynthesis, while the CPSase gene (pyrB) we have
found is the pyrimidine specific enzyme. Unlike the situation in E. coli, the gene duplication
which gave rise to two separate carbamyl phosphate synthetases specialized for arginine and
pyrimidine biosynthesis has occurred in M. barkeri. Further analysis of these clones should
give the first information about the structural organization of the pyrimidine pathway enzymes
in archebacteria. Positive clones have also been tentatively identified in the H. volcanii and M.
acetivorans libraries, but these have not yet been characterized.

Our efforts are now being directed towards more fully characterizing the putative
pyrimidine specific CPSase and mapping the location of the other pyrimidine specific
enzymes in Methanosarcina barkeri and the other archebacteria.